library(glmmTMB) #for model fitting
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(DHARMa) #for residual diagnostics
library(see) #for plotting residuals
library(knitr) #for kable
library(effects) #for partial effects plots
library(ggeffects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(modelr) #for auxillary modelling functions
library(tidyverse) #for data wrangling
library(lindia) #for diagnostics of lm and glm
library(performance) #for residuals diagnostics
library(sjPlot) #for outputs
library(report) #for reporting methods/results
library(easystats) #framework for stats, modelling and visualisation
library(MASS) #for negative binomials
library(MuMIn) #for AICc
library(patchwork) #for figure layoutsGLM Part6
1 Preparations
Load the necessary libraries
2 Scenario
An ecologist studying a rocky shore at Phillip Island, in southeastern Australia, was interested in how clumps of intertidal mussels are maintained (Quinn 1988). In particular, he wanted to know how densities of adult mussels affected recruitment of young individuals from the plankton. As with most marine invertebrates, recruitment is highly patchy in time, so he expected to find seasonal variation, and the interaction between season and density - whether effects of adult mussel density vary across seasons - was the aspect of most interest.
The data were collected from four seasons, and with two densities of adult mussels. The experiment consisted of clumps of adult mussels attached to the rocks. These clumps were then brought back to the laboratory, and the number of baby mussels recorded. There were 3-6 replicate clumps for each density and season combination.
| SEASON | DENSITY | RECRUITS | SQRTRECRUITS | GROUP |
|---|---|---|---|---|
| Spring | Low | 15 | 3.87 | SpringLow |
| .. | .. | .. | .. | .. |
| Spring | High | 11 | 3.32 | SpringHigh |
| .. | .. | .. | .. | .. |
| Summer | Low | 21 | 4.58 | SummerLow |
| .. | .. | .. | .. | .. |
| Summer | High | 34 | 5.83 | SummerHigh |
| .. | .. | .. | .. | .. |
| Autumn | Low | 14 | 3.74 | AutumnLow |
| .. | .. | .. | .. | .. |
| SEASON | Categorical listing of Season in which mussel clumps were collected independent variable |
| DENSITY | Categorical listing of the density of mussels within mussel clump independent variable |
| RECRUITS | The number of mussel recruits response variable |
| SQRTRECRUITS | Square root transformation of RECRUITS - needed to meet the test assumptions |
| GROUPS | Categorical listing of Season/Density combinations - used for checking ANOVA assumptions |
3 Read in the data
Rows: 42 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): SEASON, DENSITY, GROUP
dbl (2): RECRUITS, SQRTRECRUITS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Since we intend to model both SEASON and DENSITY as categorical variables, we need to explicitly declare them as factors.
Rows: 42
Columns: 5
$ SEASON <fct> Spring, Spring, Spring, Spring, Spring, Spring, Spring, S…
$ DENSITY <fct> Low, Low, Low, Low, Low, High, High, High, High, High, Hi…
$ RECRUITS <dbl> 15, 10, 13, 13, 5, 11, 10, 15, 10, 13, 1, 21, 31, 21, 18,…
$ SQRTRECRUITS <dbl> 3.872983, 3.162278, 3.605551, 3.605551, 2.236068, 3.31662…
$ GROUP <chr> "SpringLow", "SpringLow", "SpringLow", "SpringLow", "Spri…
# A tibble: 6 × 5
SEASON DENSITY RECRUITS SQRTRECRUITS GROUP
<fct> <fct> <dbl> <dbl> <chr>
1 Spring Low 15 3.87 SpringLow
2 Spring Low 10 3.16 SpringLow
3 Spring Low 13 3.61 SpringLow
4 Spring Low 13 3.61 SpringLow
5 Spring Low 5 2.24 SpringLow
6 Spring High 11 3.32 SpringHigh
tibble [42 × 5] (S3: tbl_df/tbl/data.frame)
$ SEASON : Factor w/ 4 levels "Spring","Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
$ DENSITY : Factor w/ 2 levels "High","Low": 2 2 2 2 2 1 1 1 1 1 ...
$ RECRUITS : num [1:42] 15 10 13 13 5 11 10 15 10 13 ...
$ SQRTRECRUITS: num [1:42] 3.87 3.16 3.61 3.61 2.24 ...
$ GROUP : chr [1:42] "SpringLow" "SpringLow" "SpringLow" "SpringLow" ...
quinn (42 rows and 5 variables, 5 shown)
ID | Name | Type | Missings | Values | N
---+--------------+-------------+----------+------------+-----------
1 | SEASON | categorical | 0 (0.0%) | Spring | 11 (26.2%)
| | | | Summer | 12 (28.6%)
| | | | Autumn | 10 (23.8%)
| | | | Winter | 9 (21.4%)
---+--------------+-------------+----------+------------+-----------
2 | DENSITY | categorical | 0 (0.0%) | High | 24 (57.1%)
| | | | Low | 18 (42.9%)
---+--------------+-------------+----------+------------+-----------
3 | RECRUITS | numeric | 0 (0.0%) | [0, 69] | 42
---+--------------+-------------+----------+------------+-----------
4 | SQRTRECRUITS | numeric | 0 (0.0%) | [0, 8.31] | 42
---+--------------+-------------+----------+------------+-----------
5 | GROUP | character | 0 (0.0%) | AutumnHigh | 6 (14.3%)
| | | | AutumnLow | 4 ( 9.5%)
| | | | SpringHigh | 6 (14.3%)
| | | | SpringLow | 5 (11.9%)
| | | | SummerHigh | 6 (14.3%)
| | | | SummerLow | 6 (14.3%)
| | | | WinterHigh | 6 (14.3%)
| | | | WinterLow | 3 ( 7.1%)
--------------------------------------------------------------------
4 Exploratory data analysis
Model formula: \[ \begin{align} y_i &\sim{} \mathcal{Pois}(\lambda_i)\\ ln(\mu_i) &= \boldsymbol{\beta} \bf{X_i}\\[1em] \end{align} \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and effects of season, density and their interaction on mussel recruitment.
Conclusions:
- there is clear evidence of non-homogeneity of variance
- specifically, there is evidence that the variance is related to the mean in that boxplots that are lower on the y-axis (low mean) also have lower variance (shorter boxplots)
- this might be expected for count data and we might consider that a Poisson distribution (which assumes that mean and variance are equal - and thus related in a very specific way).
Lets mimic the effect of using a log link, by using log scaled y-axis.
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
Conclusions:
- that is an improvement
5 Fit the model
We actually have a couple of initial options:
log transform the response variable (RECRUITS) and fit against a Gaussian distribution. If we do so, we would need to amend the response for cases when the response is zero (as the log of 0 is not defined). We could simply add 1 to each count before log transformed.
fit against a Poisson distribution with a log link
6 Model validation
Conclusions:
- most of these diagnostics seem ok, however, there is one observation (#34) that has a very high residual (and thus Cook’s D)
Model has interaction terms. VIFs might be inflated.
You may check multicollinearity among predictors of a model without
interaction terms.
# Overdispersion test
dispersion ratio = 3.309
Pearson's Chi-Squared = 112.497
p-value = < 0.001
Overdispersion detected.
# Check for zero-inflation
Observed zeros: 2
Predicted zeros: 0
Ratio: 0.00
Model is underfitting zeros (probable zero-inflation).
Conclusions:
- there is evidence of over-dispersion
$uniformity
Asymptotic one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.19018, p-value = 0.09584
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided
$outliers
DHARMa bootstrapped outlier test
data: simulationOutput
outliers at both margin(s) = 4, observations = 42, p-value < 2.2e-16
alternative hypothesis: two.sided
percent confidence interval:
0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.00928571428571429 )
0.0952381
$uniformity
Asymptotic one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.19018, p-value = 0.09584
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided
$outliers
DHARMa bootstrapped outlier test
data: simulationOutput
outliers at both margin(s) = 4, observations = 42, p-value < 2.2e-16
alternative hypothesis: two.sided
percent confidence interval:
0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.00928571428571429 )
0.0952381
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 3.1726, p-value < 2.2e-16
alternative hypothesis: two.sided
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 7.6923, p-value = 0.04
alternative hypothesis: two.sided
Conclusions:
- there is evidence of over-dispersion and outliers
- there is also some evidence of zero-inflation
[1] 2.457479e-12
[1] 3.677366
Conclusions:
- there is evidence of over-dispersion
quinn %>% group_by(SEASON, DENSITY) %>%
summarise(Zeros = sum(RECRUITS==0),
Prop = Zeros/n(),
Mean = mean(RECRUITS))`summarise()` has grouped output by 'SEASON'. You can override using the
`.groups` argument.
# A tibble: 8 × 5
# Groups: SEASON [4]
SEASON DENSITY Zeros Prop Mean
<fct> <fct> <int> <dbl> <dbl>
1 Spring High 0 0 10
2 Spring Low 0 0 11.2
3 Summer High 0 0 48.2
4 Summer Low 0 0 22
5 Autumn High 0 0 19.7
6 Autumn Low 0 0 18.2
7 Winter High 0 0 5.67
8 Winter Low 2 0.667 2.67
FALSE TRUE
0.92903 0.07097
## OR, over the entire data
## is this due to excessive zeros (zero-inflation)
tab <- table(quinn$RECRUITS == 0)
tab/sum(tab)
FALSE TRUE
0.95238095 0.04761905
## 5% is not many.. so it cant be zero-inflated
## how many 0's would we expect from a poisson distribution with a mean similar to our mean
mean(quinn$RECRUITS)[1] 18.33333
FALSE
1
Conclusion:
- although at the level of the entire dataset, there is no evidence for excessive zeros, if we explore this separately within each SEASON by DENSITY combination, we see that for the Winter/Low density group, the proportion of zeros is very high.
7 Different model
In light of the discovery that the Poisson model was over-dispersed, we will explore a negative binomial model. Rather than assume that the variance is equal to the mean (dispersion of 1), the dispersion parameter is estimated. Negative binomial models are also able to account for some level of zero-inflation.
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.11213, p-value = 0.626
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided
$outliers
DHARMa bootstrapped outlier test
data: simulationOutput
outliers at both margin(s) = 1, observations = 42, p-value = 0.74
alternative hypothesis: two.sided
percent confidence interval:
0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.010952380952381 )
0.02380952
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.11213, p-value = 0.626
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided
$outliers
DHARMa bootstrapped outlier test
data: simulationOutput
outliers at both margin(s) = 1, observations = 42, p-value = 0.74
alternative hypothesis: two.sided
percent confidence interval:
0.00000000 0.04761905
sample estimates:
outlier frequency (expected: 0.010952380952381 )
0.02380952
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.78026, p-value = 0.624
alternative hypothesis: two.sided
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 6.1728, p-value = 0.104
alternative hypothesis: two.sided
Conclusions:
- there is no evidence that the negative binomial model is over-dispersed or zero-inflated
- all other diagnostics are also improved
8 Partial plots
9 Model investigation / hypothesis testing
Lets start with the estimated coefficients on the link (log) scale
Call:
MASS::glm.nb(formula = RECRUITS ~ DENSITY * SEASON, data = quinn,
init.theta = 9.022467857, link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3026 0.1875 12.283 < 2e-16 ***
DENSITYLow 0.1133 0.2742 0.413 0.67934
SEASONSummer 1.5721 0.2389 6.581 4.69e-11 ***
SEASONAutumn 0.6763 0.2492 2.714 0.00664 **
SEASONWinter -0.5680 0.2881 -1.971 0.04870 *
DENSITYLow:SEASONSummer -0.8970 0.3509 -2.556 0.01059 *
DENSITYLow:SEASONAutumn -0.1881 0.3788 -0.496 0.61955
DENSITYLow:SEASONWinter -0.8671 0.5338 -1.624 0.10432
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(9.0225) family taken to be 1)
Null deviance: 183.269 on 41 degrees of freedom
Residual deviance: 54.883 on 34 degrees of freedom
AIC: 293.09
Number of Fisher Scoring iterations: 1
Theta: 9.02
Std. Err.: 3.69
2 x log-likelihood: -275.095
Conclusions:
- the intercept represents the estimated mean of the first combination of Season (Spring) and Density (High). On the link scale this is 2.3
- the difference between Low and High adult density in spring is 0.11, although this is not significant
- the difference between Spring and Summer for High adult density is 1.57
- the difference between Spring and Autumn for High adult density is 0.68
- the difference between Spring and Winter for High adult density is -0.57
- if there were no interactions, we would expect the Low density Summer recruitment to be the additive of the main effects (Low and Summer). However, the modelled mean is 0.9 less than the additive effects would have expected. This value is significantly different to 0, indicating that there is evidence that the density effect in Summer is different to that in Spring.
- the density effect in Autumn and Winter were not found to be significantly different from what you would expect from an additive model.
# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.30 0.187 12.3 1.11e-34 1.94 2.67
2 DENSITYLow 0.113 0.274 0.413 6.79e- 1 -0.424 0.652
3 SEASONSummer 1.57 0.239 6.58 4.69e-11 1.11 2.04
4 SEASONAutumn 0.676 0.249 2.71 6.64e- 3 0.189 1.17
5 SEASONWinter -0.568 0.288 -1.97 4.87e- 2 -1.14 -0.00675
6 DENSITYLow:SEASONSum… -0.897 0.351 -2.56 1.06e- 2 -1.59 -0.209
7 DENSITYLow:SEASONAut… -0.188 0.379 -0.496 6.20e- 1 -0.930 0.556
8 DENSITYLow:SEASONWin… -0.867 0.534 -1.62 1.04e- 1 -1.95 0.154
Now if we exponentiate to put these estimates on the response scale:
# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 10.0 0.187 12.3 1.11e-34 6.93 14.5
2 DENSITYLow 1.12 0.274 0.413 6.79e- 1 0.654 1.92
3 SEASONSummer 4.82 0.239 6.58 4.69e-11 3.02 7.72
4 SEASONAutumn 1.97 0.249 2.71 6.64e- 3 1.21 3.21
5 SEASONWinter 0.567 0.288 -1.97 4.87e- 2 0.320 0.993
6 DENSITYLow:SEASONSum… 0.408 0.351 -2.56 1.06e- 2 0.205 0.811
7 DENSITYLow:SEASONAut… 0.829 0.379 -0.496 6.20e- 1 0.394 1.74
8 DENSITYLow:SEASONWin… 0.420 0.534 -1.62 1.04e- 1 0.142 1.17
Conclusions:
- the intercept represents the estimated mean of the first combination of Season (Spring) and Density (High). On the response scale this is 10
- there is a 0.11 fold difference between High and Low adult density in spring, although this is not significant
- there is a 1.57 fold difference between Summer and Spring for High adult density
- there is a 0.68 fold difference between Autumn and Spring for High adult density
- there is a -0.57 fold difference between Winter and Spring for High adult density
- the modelled mean for Summer Low density is 0.9 fold different (less than half) that we would have expected in the absence of interactions
- the density effect in Autumn and Winter were not found to be significantly different from what you would expect from an additive model.
If we compare the above to the over-dispersed Poisson, we see that the estimates are the same, but that the standard errors are underestimated and hence the confidence intervals are narrower (for over-dispersed model)
# A tibble: 8 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 10.0 0.129 17.8 3.73e-71 7.68 12.7
2 DENSITYLow 1.12 0.186 0.610 5.42e- 1 0.777 1.61
3 SEASONSummer 4.82 0.142 11.1 1.55e-28 3.68 6.42
4 SEASONAutumn 1.97 0.159 4.27 1.99e- 5 1.45 2.70
5 SEASONWinter 0.567 0.215 -2.65 8.15e- 3 0.368 0.857
6 DENSITYLow:SEASONSum… 0.408 0.213 -4.20 2.64e- 5 0.268 0.620
7 DENSITYLow:SEASONAut… 0.829 0.238 -0.790 4.30e- 1 0.519 1.32
8 DENSITYLow:SEASONWin… 0.420 0.435 -1.99 4.61e- 2 0.169 0.945
10 Predictions
In order to tease apart the significant interaction(s), it might be useful to explore the effect of Density separately within each Season.
SEASON = Spring:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
High - Low -0.1133 0.274 Inf -0.651 0.424 -0.413 0.6793
SEASON = Summer:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
High - Low 0.7836 0.219 Inf 0.354 1.213 3.577 0.0003
SEASON = Autumn:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
High - Low 0.0748 0.261 Inf -0.438 0.587 0.286 0.7749
SEASON = Winter:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
High - Low 0.7538 0.458 Inf -0.144 1.652 1.646 0.0999
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conclusions:
- there is no evidence of an effect of Density in either Spring, Autumn or Winter
- in Summer, recruitment is 0.78 higher (on a log scale) in High Density populations than Low Density populations.
Or we could express these on a factor (fold/ratio) scale.
SEASON = Spring:
contrast ratio SE df null z.ratio p.value
High / Low 0.893 0.245 Inf 1 -0.413 0.6793
SEASON = Summer:
contrast ratio SE df null z.ratio p.value
High / Low 2.189 0.480 Inf 1 3.577 0.0003
SEASON = Autumn:
contrast ratio SE df null z.ratio p.value
High / Low 1.078 0.282 Inf 1 0.286 0.7749
SEASON = Winter:
contrast ratio SE df null z.ratio p.value
High / Low 2.125 0.973 Inf 1 1.646 0.0999
Tests are performed on the log scale
Or perhaps even an absolute difference scale.
SEASON = Spring:
contrast estimate SE df z.ratio p.value
High - Low -1.20 2.92 Inf -0.411 0.6812
SEASON = Summer:
contrast estimate SE df z.ratio p.value
High - Low 26.17 7.97 Inf 3.284 0.0010
SEASON = Autumn:
contrast estimate SE df z.ratio p.value
High - Low 1.42 4.92 Inf 0.288 0.7734
SEASON = Winter:
contrast estimate SE df z.ratio p.value
High - Low 3.00 1.64 Inf 1.829 0.0673
Conclusions:
- there is no evidence of an effect of Density in either Spring, Autumn or Winter
- in Summer, recruitment is 2.19 fold higher in High Density populations than Low Density populations.
SEASON = Spring:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
High / Low 0.893 0.245 Inf 0.522 1.53 1 -0.413 0.6793
SEASON = Summer:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
High / Low 2.189 0.480 Inf 1.425 3.36 1 3.577 0.0003
SEASON = Autumn:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
High / Low 1.078 0.282 Inf 0.646 1.80 1 0.286 0.7749
SEASON = Winter:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
High / Low 2.125 0.973 Inf 0.866 5.22 1 1.646 0.0999
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Tests are performed on the log scale
It might be useful to capture these pairwise contrasts so that we can graph them as a caterpillar plot.
In the following, I will present the effects on a log (based 2) axis. Such a scale presents doubling and halving (etc) equidistant from 1.
eff <- quinn.nb %>%
emmeans(~DENSITY|SEASON, type='response') %>%
pairs() %>%
summary(infer=TRUE) %>%
as.data.frame
eff %>%
ggplot(aes(y=ratio, x=SEASON)) +
geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
geom_text(aes(label=sprintf("p=%.2f", p.value)), nudge_x=0.25) +
geom_hline(yintercept=1, linetype='dashed') +
scale_x_discrete(name='') +
scale_y_continuous(name='Density effect (High vs Low)', trans=scales::log2_trans(),
breaks=scales::breaks_log(base=2)) +
coord_flip(ylim=c(0.25, 4)) +
theme_classic()11 Summary figures
As these summarise only involve categorical predictors, there is no need to define a prediction grid. For categorical predictors, the default grid will assume that you are interested in all the levels of the categorical predictors.
DENSITY SEASON response SE df asymp.LCL asymp.UCL
High Spring 10.00000 1.874542 Inf 6.92530 14.43980
Low Spring 11.20000 2.240673 Inf 7.56705 16.57713
High Summer 48.16667 7.133321 Inf 36.03185 64.38826
Low Summer 22.00000 3.550677 Inf 16.03406 30.18574
High Autumn 19.66667 3.228389 Inf 14.25612 27.13064
Low Autumn 18.25000 3.713650 Inf 12.24768 27.19392
Confidence level used: 0.95
Intervals are back-transformed from the log scale
ggplot(newdata, aes(y = response, x = SEASON, fill = DENSITY)) +
geom_pointrange(aes(ymin = asymp.LCL, ymax = asymp.UCL, shape = DENSITY),
position = position_dodge(width = 0.2)) +
theme_classic() +
theme(axis.title.x = element_blank(),
legend.position = c(0.01,1), legend.justification = c(0,1)) +
annotate(geom = 'text', x = 'Summer', y = 70, label = '*', size = 7) +
scale_shape_manual(values = c(21, 22))12 Zero-inflated model
Unfortunately, DHARMa and performance do not work with zeroinf() models.
library(pscl)
library(glmmTMB)
quinn.zip <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn, dist='poisson')
quinn.zip <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~1, data=quinn, family=poisson())
quinn.resid <- quinn.zip %>% simulateResiduals(plot=TRUE)
## The following does not work due to a lack of pearson residuals for glmmTMB
## quinn.zip %>% performance::check_overdispersion()
quinn.zip %>% summary() Family: poisson ( log )
Formula: RECRUITS ~ DENSITY * SEASON
Zero inflation: ~1
Data: quinn
AIC BIC logLik deviance df.resid
320.6 336.2 -151.3 302.6 33
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3026 0.1291 17.836 < 2e-16 ***
DENSITYLow 0.1133 0.1858 0.610 0.54191
SEASONSummer 1.5721 0.1419 11.081 < 2e-16 ***
SEASONAutumn 0.6763 0.1586 4.266 1.99e-05 ***
SEASONWinter -0.5680 0.2147 -2.646 0.00815 **
DENSITYLow:SEASONSummer -0.8970 0.2134 -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn -0.1881 0.2381 -0.790 0.42957
DENSITYLow:SEASONWinter 0.2165 0.4534 0.478 0.63300
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.0037 0.7305 -4.112 3.92e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 0.0496032
## quinn.zip1 <- zeroinfl(RECRUITS ~ DENSITY*SEASON | SEASON, data=quinn, dist='poisson')
quinn.zip <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~1, data=quinn, family=poisson())
summary(quinn.zip) Family: poisson ( log )
Formula: RECRUITS ~ DENSITY * SEASON
Zero inflation: ~1
Data: quinn
AIC BIC logLik deviance df.resid
320.6 336.2 -151.3 302.6 33
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3026 0.1291 17.836 < 2e-16 ***
DENSITYLow 0.1133 0.1858 0.610 0.54191
SEASONSummer 1.5721 0.1419 11.081 < 2e-16 ***
SEASONAutumn 0.6763 0.1586 4.266 1.99e-05 ***
SEASONWinter -0.5680 0.2147 -2.646 0.00815 **
DENSITYLow:SEASONSummer -0.8970 0.2134 -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn -0.1881 0.2381 -0.790 0.42957
DENSITYLow:SEASONWinter 0.2165 0.4534 0.478 0.63300
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.0037 0.7305 -4.112 3.92e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 0.98232, p-value = 1
alternative hypothesis: two.sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.4123, p-value = 0.512
alternative hypothesis: two.sided
quinn.zip1 <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~SEASON, data=quinn, family=poisson())
quinn.resid <- quinn.zip1 %>% simulateResiduals(plot=TRUE) Family: poisson ( log )
Formula: RECRUITS ~ DENSITY * SEASON
Zero inflation: ~SEASON
Data: quinn
AIC BIC logLik deviance df.resid
320.0 340.9 -148.0 296.0 30
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3026 0.1291 17.836 < 2e-16 ***
DENSITYLow 0.1133 0.1858 0.610 0.54191
SEASONSummer 1.5721 0.1419 11.081 < 2e-16 ***
SEASONAutumn 0.6763 0.1586 4.266 1.99e-05 ***
SEASONWinter -0.5680 0.2147 -2.646 0.00815 **
DENSITYLow:SEASONSummer -0.8970 0.2134 -4.202 2.64e-05 ***
DENSITYLow:SEASONAutumn -0.1881 0.2381 -0.790 0.42957
DENSITYLow:SEASONWinter 0.2291 0.4375 0.524 0.60045
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -20.468 8393.486 -0.002 0.998
SEASONSummer -3.275 42171.914 0.000 1.000
SEASONAutumn -3.527 52021.072 0.000 1.000
SEASONWinter 19.214 8393.486 0.002 0.998
[1] 1.290805e-09
[1] 0.2220085
## quinn.zinb <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn, dist='negbin')
quinn.zinb <- glmmTMB(RECRUITS ~ DENSITY*SEASON, zi=~SEASON, data=quinn, family=nbinom2())
quinn.resid <- quinn.zinb %>% simulateResiduals(plot=TRUE) df AICc
quinn.zip 9 326.1881
quinn.zip1 12 330.7988
quinn.zinb 13 309.2125
Family: nbinom2 ( log )
Formula: RECRUITS ~ DENSITY * SEASON
Zero inflation: ~SEASON
Data: quinn
AIC BIC logLik deviance df.resid
296.2 318.8 -135.1 270.2 29
Dispersion parameter for nbinom2 family (): 11
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3026 0.1784 12.907 < 2e-16 ***
DENSITYLow 0.1133 0.2605 0.435 0.66355
SEASONSummer 1.5721 0.2246 7.000 2.56e-12 ***
SEASONAutumn 0.6763 0.2355 2.872 0.00408 **
SEASONWinter -0.5680 0.2764 -2.055 0.03988 *
DENSITYLow:SEASONSummer -0.8970 0.3305 -2.714 0.00665 **
DENSITYLow:SEASONAutumn -0.1881 0.3577 -0.526 0.59899
DENSITYLow:SEASONWinter 0.2130 0.5887 0.362 0.71757
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -20.772 9771.312 -0.002 0.998
SEASONSummer -6.012 189293.632 0.000 1.000
SEASONAutumn -5.942 200192.420 0.000 1.000
SEASONWinter 19.507 9771.312 0.002 0.998